Independent two-sample t-test (t_test_independent)#
The independent (unpaired) two-sample t-test asks:
“Do these two independent groups plausibly come from populations with the same mean?”
It is used when you have:
a continuous outcome (interval/ratio scale)
two independent groups (different people/items; not paired / repeated measurements)
interest in a difference in means
This notebook covers:
Student’s t-test (assumes equal variances)
Welch’s t-test (does not assume equal variances; often the safer default)
Learning goals#
By the end you should be able to:
choose the right t-test variant (Student vs Welch)
write down \(H_0\) / \(H_1\) and pick one- vs two-sided alternatives
compute the t-statistic and degrees of freedom from scratch (NumPy)
understand what the p-value is (and what it is not)
interpret results using both p-values and confidence intervals
build intuition with simulations and Plotly visuals
import math
import platform
from dataclasses import dataclass
from typing import Literal
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
print("Python", platform.python_version())
print("NumPy", np.__version__)
# SciPy is optional: we use it only for optional cross-checks, never for the core computation.
try:
import scipy
from scipy.stats import t as scipy_t
from scipy.stats import ttest_ind
HAS_SCIPY = True
print("SciPy", scipy.__version__)
except Exception as e:
HAS_SCIPY = False
print("SciPy not available:", repr(e))
Python 3.12.9
NumPy 1.26.2
SciPy 1.15.0
Prerequisites (quick)#
Math / stats:
sample mean and (unbiased) sample variance
the idea of a standard error (estimate of sampling uncertainty)
basic probability: “area under a curve”
Tools:
NumPy
Plotly (for visuals)
1) When to use the independent t-test#
Use it when:
you compare the means of a numeric variable between two independent groups
you can treat observations as independent within and across groups
the outcome is approximately normal within each group (or sample sizes are large enough for the CLT to help)
Do not use it when:
the same subject/item is measured twice → use a paired t-test
you have >2 groups → use ANOVA (or multiple-comparisons-corrected pairwise tests)
you care about medians / heavy-tailed distributions / strong outliers → consider Mann–Whitney U, a permutation test, or robust methods
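For that last case, a permutation test is a simple distribution-free alternative. A minimal sketch (the helper name perm_test_mean_diff is ours, for illustration only):

def perm_test_mean_diff(x, y, n_perm: int = 10_000, seed: int = 0) -> float:
    '''Two-sided permutation p-value for mean(x) - mean(y) (illustrative sketch).'''
    rng_local = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng_local.shuffle(pooled)  # randomly relabel observations across groups
        diff = pooled[: x.size].mean() - pooled[x.size :].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # add-one correction keeps the p-value valid (never exactly 0)
    return (count + 1) / (n_perm + 1)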
2) Hypotheses and alternatives#
Let \(\mu_1\) and \(\mu_2\) be the population means for group 1 and group 2.
Most common (two-sided):

\[H_0: \mu_1 - \mu_2 = 0 \qquad \text{vs} \qquad H_1: \mu_1 - \mu_2 \neq 0\]
One-sided alternatives are also possible:
greater: \(H_1: \mu_1 - \mu_2 > 0\)
less: \(H_1: \mu_1 - \mu_2 < 0\)
Pick the alternative before looking at the data.
3) The test statistic (Student vs Welch)#
We observe samples:
group 1: \(x_1,\dots,x_{n_1}\)
group 2: \(y_1,\dots,y_{n_2}\)
Compute sample means:

\[\bar{x} = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i, \qquad \bar{y} = \frac{1}{n_2}\sum_{j=1}^{n_2} y_j\]

and unbiased sample variances:

\[s_x^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1} (x_i - \bar{x})^2, \qquad s_y^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2} (y_j - \bar{y})^2\]

The difference in sample means is:

\[\Delta = \bar{x} - \bar{y}\]

The t-test turns this into a signal-to-noise ratio:

\[t = \frac{\Delta}{\mathrm{SE}(\Delta)}\]

where \(\mathrm{SE}(\Delta)\) is the standard error of the difference.
Student’s t-test (equal variances)#
Assumes \(\sigma_x^2 = \sigma_y^2\).
Pooled variance:

\[s_p^2 = \frac{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}{n_1 + n_2 - 2}\]

Standard error:

\[\mathrm{SE}(\Delta) = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\]

Degrees of freedom:

\[\mathrm{df} = n_1 + n_2 - 2\]
Welch’s t-test (unequal variances)#
Does not assume equal variances.
Standard error:

\[\mathrm{SE}(\Delta) = \sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}\]

Approximate degrees of freedom (Welch–Satterthwaite):

\[\mathrm{df} \approx \frac{\left(\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}\right)^2}{\frac{(s_x^2/n_1)^2}{n_1 - 1} + \frac{(s_y^2/n_2)^2}{n_2 - 1}}\]
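To make the two recipes concrete, here is a small worked example with toy summary statistics (the numbers are illustrative, not from the data below). With unequal variances, the Welch degrees of freedom come out smaller than the Student value:

# Toy summary statistics (illustrative)
n1_toy, n2_toy = 10, 12
v1_toy, v2_toy = 1.0, 4.0  # unequal sample variances

# Student: pooled variance, SE, df
df_student = n1_toy + n2_toy - 2
sp2_toy = ((n1_toy - 1) * v1_toy + (n2_toy - 1) * v2_toy) / df_student
se_student = math.sqrt(sp2_toy * (1 / n1_toy + 1 / n2_toy))

# Welch: SE and Welch–Satterthwaite df
se_welch_toy = math.sqrt(v1_toy / n1_toy + v2_toy / n2_toy)
df_welch_toy = (v1_toy / n1_toy + v2_toy / n2_toy) ** 2 / (
    (v1_toy / n1_toy) ** 2 / (n1_toy - 1) + (v2_toy / n2_toy) ** 2 / (n2_toy - 1)
)

print(f"Student: df = {df_student}, SE = {se_student:.4f}")
print(f"Welch:   df = {df_welch_toy:.2f}, SE = {se_welch_toy:.4f}")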
From \(t\) to a p-value#
Under \(H_0\) (and assumptions), the statistic follows a Student t distribution:

\[t \sim t(\mathrm{df})\]

The p-value is a tail probability under that reference distribution (two-sided example):

\[p = 2\,P\!\left(T \geq |t_{\mathrm{obs}}|\right), \qquad T \sim t(\mathrm{df})\]
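If SciPy is available, these tail probabilities can be computed directly with the survival function (we build a NumPy-only equivalent in section 5; the values here are illustrative):

if HAS_SCIPY:
    t_obs_demo, df_demo = 2.1, 30.0  # illustrative values
    print("two-sided:", 2.0 * scipy_t.sf(abs(t_obs_demo), df_demo))  # sf(x) = 1 - cdf(x)
    print("greater:  ", scipy_t.sf(t_obs_demo, df_demo))
    print("less:     ", scipy_t.cdf(t_obs_demo, df_demo))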
4) What the p-value means (precisely)#
A p-value is:
A probability computed under the null model.
The probability of getting a result at least as extreme as observed, if \(H_0\) were true.
A p-value is not:
the probability that \(H_0\) is true
the probability that the result happened “by chance” (a phrase that is ill-defined without a null model)
a measure of practical importance (effect size matters!)
5) NumPy-only implementation#
We’ll implement:
summary stats (\(\bar{x}\), \(s^2\))
the t-statistic + degrees of freedom (Student or Welch)
the Student-t PDF
a numerical CDF via integration (needed for p-values and critical values)
Note: numerical integration is for learning; for production-grade accuracy/performance, prefer scipy.stats.
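As a quick illustration of the trapezoid idea before we apply it to the t PDF, integrating the standard normal density over [-6, 6] should give a value very close to 1:

xs_demo = np.linspace(-6.0, 6.0, 10_001)
pdf_demo = np.exp(-0.5 * xs_demo**2) / np.sqrt(2.0 * np.pi)  # standard normal PDF
dx_demo = xs_demo[1] - xs_demo[0]
area = np.sum((pdf_demo[:-1] + pdf_demo[1:]) * 0.5 * dx_demo)  # trapezoid rule
print(area)  # should be very close to 1.0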
Alternative = Literal["two-sided", "less", "greater"]
def _as_1d_float(x, name: str) -> np.ndarray:
arr = np.asarray(x, dtype=float).reshape(-1)
if arr.size < 2:
raise ValueError(f"{name} must have at least 2 observations")
if np.any(~np.isfinite(arr)):
raise ValueError(f"{name} contains non-finite values")
return arr
@dataclass(frozen=True)
class TDistLookup:
df: float
t_max: float
xs: np.ndarray
cdf_pos: np.ndarray
def t_pdf(x: np.ndarray | float, df: float) -> np.ndarray:
'''Student-t PDF (NumPy-only).'''
x = np.asarray(x, dtype=float)
df = float(df)
if not np.isfinite(df) or df <= 0:
raise ValueError("df must be finite and > 0")
# pdf(x) = Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)) * (1 + x^2/df)^(-(df+1)/2)
log_c = (
math.lgamma((df + 1.0) / 2.0)
- 0.5 * (math.log(df) + math.log(math.pi))
- math.lgamma(df / 2.0)
)
c = math.exp(log_c)
return c * (1.0 + (x * x) / df) ** (-(df + 1.0) / 2.0)
def make_t_lookup(df: float, *, t_max: float = 12.0, n: int = 60_000) -> TDistLookup:
'''Approximate the t CDF on [0, t_max] via cumulative trapezoid integration.'''
df = float(df)
t_max = float(t_max)
n = int(n)
if not np.isfinite(df) or df <= 0:
raise ValueError("df must be finite and > 0")
if not np.isfinite(t_max) or t_max <= 0:
raise ValueError("t_max must be finite and > 0")
if n < 1000:
raise ValueError("n must be >= 1000 for a reasonable CDF approximation")
xs = np.linspace(0.0, t_max, n + 1)
pdf = t_pdf(xs, df)
dx = float(xs[1] - xs[0])
# Integral from 0..xs[i] (for i>=1)
area = np.cumsum((pdf[:-1] + pdf[1:]) * 0.5 * dx)
cdf_pos = np.empty_like(xs)
cdf_pos[0] = 0.5
cdf_pos[1:] = 0.5 + area
cdf_pos = np.clip(cdf_pos, 0.0, 1.0)
return TDistLookup(df=df, t_max=t_max, xs=xs, cdf_pos=cdf_pos)
def t_cdf(t: np.ndarray | float, lookup: TDistLookup) -> np.ndarray:
'''Student-t CDF via interpolation from a precomputed lookup table.'''
t = np.asarray(t, dtype=float)
t_abs = np.abs(t)
cdf_abs = np.interp(t_abs, lookup.xs, lookup.cdf_pos, left=0.5, right=1.0)
return np.where(t >= 0, cdf_abs, 1.0 - cdf_abs)
def t_ppf(p: np.ndarray | float, lookup: TDistLookup) -> np.ndarray:
'''Inverse CDF (quantile) via interpolation.'''
p = np.asarray(p, dtype=float)
if np.any((p <= 0.0) | (p >= 1.0)):
raise ValueError("p must be in (0, 1)")
sign = np.where(p >= 0.5, 1.0, -1.0)
p_abs = np.where(p >= 0.5, p, 1.0 - p)
t_abs = np.interp(p_abs, lookup.cdf_pos, lookup.xs)
return sign * t_abs
def t_p_value(t_obs: float, lookup: TDistLookup, *, alternative: Alternative) -> float:
'''p-value for a t-statistic under t(df).'''
cdf = float(t_cdf(t_obs, lookup))
if alternative == "two-sided":
p = 2.0 * min(cdf, 1.0 - cdf)
elif alternative == "greater":
p = 1.0 - cdf
elif alternative == "less":
p = cdf
else:
raise ValueError("alternative must be one of: 'two-sided', 'less', 'greater'")
return float(np.clip(p, 0.0, 1.0))
@dataclass(frozen=True)
class IndependentTTestResult:
t: float
df: float
p_value: float
alternative: Alternative
equal_var: bool
n1: int
n2: int
mean1: float
mean2: float
var1: float
var2: float
diff: float
se: float
ci: tuple[float, float]
alpha: float
cohens_d: float
cohens_d_method: str
def t_test_independent_numpy(
x,
y,
*,
equal_var: bool = False,
alternative: Alternative = "two-sided",
alpha: float = 0.05,
t_max: float = 12.0,
lookup_n: int = 60_000,
) -> IndependentTTestResult:
'''Independent two-sample t-test (Student or Welch), using NumPy-only p-values.'''
x = _as_1d_float(x, "x")
y = _as_1d_float(y, "y")
n1, n2 = int(x.size), int(y.size)
mean1, mean2 = float(x.mean()), float(y.mean())
var1, var2 = float(x.var(ddof=1)), float(y.var(ddof=1))
diff = mean1 - mean2
if equal_var:
df = float(n1 + n2 - 2)
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
se = math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
cohens_d = diff / math.sqrt(sp2)
cohens_d_method = "pooled SD"
else:
se = math.sqrt(var1 / n1 + var2 / n2)
df = float(
(var1 / n1 + var2 / n2) ** 2
/ ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
)
# Fallback for Welch: standardize by the square root of the average variance
# (a scale reference for the effect size, not a pooled SD).
cohens_d = diff / math.sqrt(0.5 * (var1 + var2))
cohens_d_method = "average SD"
if se == 0.0:
raise ValueError("Standard error is zero; check data (all values identical?)")
t_stat = diff / se
if not (0.0 < alpha < 1.0):
raise ValueError("alpha must be in (0, 1)")
lookup = make_t_lookup(df, t_max=t_max, n=lookup_n)
p_value = t_p_value(t_stat, lookup, alternative=alternative)
t_crit = float(t_ppf(1.0 - alpha / 2.0, lookup)) # always two-sided CI
ci = (diff - t_crit * se, diff + t_crit * se)
return IndependentTTestResult(
t=float(t_stat),
df=float(df),
p_value=float(p_value),
alternative=alternative,
equal_var=bool(equal_var),
n1=n1,
n2=n2,
mean1=mean1,
mean2=mean2,
var1=var1,
var2=var2,
diff=float(diff),
se=float(se),
ci=(float(ci[0]), float(ci[1])),
alpha=float(alpha),
cohens_d=float(cohens_d),
cohens_d_method=cohens_d_method,
)
def format_result(r: IndependentTTestResult) -> str:
ci_level = int(round((1.0 - r.alpha) * 100))
return "\n".join(
[
f"t = {r.t:.4f} | df = {r.df:.2f} | p = {r.p_value:.6g} ({r.alternative}, equal_var={r.equal_var})",
f"mean1 = {r.mean1:.4f} (n={r.n1}), mean2 = {r.mean2:.4f} (n={r.n2})",
f"diff = mean1 - mean2 = {r.diff:.4f}, SE(diff) = {r.se:.4f}",
f"{ci_level}% CI for diff: [{r.ci[0]:.4f}, {r.ci[1]:.4f}]",
f"Cohen's d ({r.cohens_d_method}): {r.cohens_d:.4f}",
]
)
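Before using this machinery, a quick sanity check: the 97.5% quantile of a t distribution with 10 degrees of freedom is about 2.228 in standard tables, and the CDF evaluated back at that quantile should return roughly 0.975.

lookup_check = make_t_lookup(10.0)
q = float(t_ppf(0.975, lookup_check))
print("t_{0.975, df=10} =", q)  # standard tables give about 2.228
print("t_cdf(q) =", float(t_cdf(q, lookup_check)))  # should be close to 0.975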
6) Worked example (synthetic data)#
We create two independent groups with a small mean difference.
Tip: In real analyses, build these arrays from your dataset after filtering into two groups.
n1, n2 = 25, 28
group1 = rng.normal(loc=0.0, scale=1.0, size=n1)
group2 = rng.normal(loc=0.5, scale=1.0, size=n2)
print("group1: mean=", float(group1.mean()), "sd=", float(group1.std(ddof=1)))
print("group2: mean=", float(group2.mean()), "sd=", float(group2.std(ddof=1)))
group1: mean= -0.035389987592057665 sd= 0.8310923822923664
group2: mean= 0.6753054649687451 sd= 0.7321879696862332
# Visualize the raw distributions
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=group1,
name="group1",
histnorm="probability density",
opacity=0.6,
marker_color="#1f77b4",
)
)
fig.add_trace(
go.Histogram(
x=group2,
name="group2",
histnorm="probability density",
opacity=0.6,
marker_color="#d62728",
)
)
m1 = float(group1.mean())
m2 = float(group2.mean())
fig.add_shape(type="line", x0=m1, x1=m1, y0=0, y1=1, yref="paper", line=dict(color="#1f77b4", dash="dash"))
fig.add_shape(type="line", x0=m2, x1=m2, y0=0, y1=1, yref="paper", line=dict(color="#d62728", dash="dash"))
fig.update_layout(
title="Independent groups: histogram (density) with sample means",
barmode="overlay",
xaxis_title="Value",
yaxis_title="Density",
)
fig.show()
fig2 = go.Figure()
fig2.add_trace(go.Violin(y=group1, name="group1", box_visible=True, meanline_visible=True, line_color="#1f77b4"))
fig2.add_trace(go.Violin(y=group2, name="group2", box_visible=True, meanline_visible=True, line_color="#d62728"))
fig2.update_layout(title="Violin + box plot", yaxis_title="Value")
fig2.show()
# Run the test (Student vs Welch)
res_student = t_test_independent_numpy(group1, group2, equal_var=True)
res_welch = t_test_independent_numpy(group1, group2, equal_var=False)
print("Student (equal variances):")
print(format_result(res_student))
print("\nWelch (unequal variances):")
print(format_result(res_welch))
Student (equal variances):
t = -3.3101 | df = 51.00 | p = 0.00171715 (two-sided, equal_var=True)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2147
95% CI for diff: [-1.1417, -0.2797]
Cohen's d (pooled SD): -0.9108
Welch (unequal variances):
t = -3.2861 | df = 48.21 | p = 0.00189875 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2163
95% CI for diff: [-1.1455, -0.2759]
Cohen's d (average SD): -0.9074
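A quick look at the sample variances helps explain why the two variants agree so closely here. This ratio is an informal check, not a formal test:

v1 = float(group1.var(ddof=1))
v2 = float(group2.var(ddof=1))
print(f"var1 = {v1:.4f}, var2 = {v2:.4f}, ratio = {max(v1, v2) / min(v1, v2):.2f}")
# With a variance ratio near 1 and similar sample sizes, Student and Welch
# give nearly identical t, df, and p, as the printouts above show.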
def plot_two_sided_p_value_region(
t_obs: float,
df: float,
p_value: float,
*,
x_max: float = 6.0,
n: int = 2000,
) -> go.Figure:
xs = np.linspace(-x_max, x_max, n)
ys = t_pdf(xs, df)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name=f"t(df={df:.2f})"))
t_abs = abs(float(t_obs))
xr = np.linspace(t_abs, x_max, 800)
yr = t_pdf(xr, df)
# Right tail polygon
fig.add_trace(
go.Scatter(
x=np.r_[xr, xr[::-1]],
y=np.r_[yr, np.zeros_like(yr)],
fill="toself",
fillcolor="rgba(214, 39, 40, 0.25)",
line=dict(color="rgba(214, 39, 40, 0.45)"),
hoverinfo="skip",
showlegend=False,
)
)
# Left tail polygon (mirror)
xl = -xr[::-1] # from -x_max .. -t_abs
yl = yr[::-1]
fig.add_trace(
go.Scatter(
x=np.r_[xl, xl[::-1]],
y=np.r_[yl, np.zeros_like(yl)],
fill="toself",
fillcolor="rgba(214, 39, 40, 0.25)",
line=dict(color="rgba(214, 39, 40, 0.45)"),
hoverinfo="skip",
showlegend=False,
)
)
y_top = float(np.max(ys))
fig.add_shape(
type="line",
x0=t_obs,
x1=t_obs,
y0=0.0,
y1=y_top,
line=dict(color="black", dash="dash"),
)
fig.add_annotation(
x=t_obs,
y=y_top * 0.9,
text=f"t = {t_obs:.3f}<br>p = {p_value:.3g}",
showarrow=True,
arrowhead=2,
ax=40,
ay=-40,
)
fig.update_layout(
title="Two-sided p-value = shaded tail area under t(df)",
xaxis_title="t",
yaxis_title="Density",
)
return fig
plot_two_sided_p_value_region(res_welch.t, res_welch.df, res_welch.p_value, x_max=6.0).show()
7) How to interpret the output#
A practical checklist:
Direction: the sign of t matches the sign of mean1 - mean2.
Statistical significance: compare p to your threshold \(\alpha\) (commonly 0.05).
Confidence interval: a \((1-\alpha)\) CI for \((\mu_1-\mu_2)\) that excludes 0 corresponds to rejecting \(H_0\) at level \(\alpha\).
Practical importance: consider effect size (e.g. Cohen’s d) and whether the CI is meaningfully far from 0.
Assumptions: independence (design), distribution shape/outliers, variance structure.
for name, r in [("Student", res_student), ("Welch", res_welch)]:
includes_zero = r.ci[0] <= 0.0 <= r.ci[1]
print(f"{name}: CI includes 0? {includes_zero}")
Student: CI includes 0? False
Welch: CI includes 0? False
8) Assumptions, diagnostics, and pitfalls#
Independence (most important)#
The t-test assumes each observation is independent. This is usually a study design question (random sampling / random assignment).
Approximate normality (within each group)#
The t-test is fairly robust for moderate/large sample sizes.
With strong skew / heavy tails / outliers, the mean can behave badly.
Equal variances (only for Student’s t-test)#
If you are not comfortable assuming equal variances, use Welch’s t-test.
Outliers#
Because the test compares means, a single extreme value can move the mean and change the result.
# Outlier sensitivity demo: add a single extreme value to group2
outlier_value = 6.0
group2_outlier = np.concatenate([group2, [outlier_value]])
res_welch_outlier = t_test_independent_numpy(group1, group2_outlier, equal_var=False)
print("Welch (baseline):")
print(format_result(res_welch))
print("\nWelch (group2 with one added outlier):")
print(format_result(res_welch_outlier))
fig = make_subplots(rows=1, cols=2, subplot_titles=("Original data", "After adding an outlier to group2"))
fig.add_trace(
go.Histogram(
x=group1,
name="group1",
histnorm="probability density",
opacity=0.6,
marker_color="#1f77b4",
),
row=1,
col=1,
)
fig.add_trace(
go.Histogram(
x=group2,
name="group2",
histnorm="probability density",
opacity=0.6,
marker_color="#d62728",
),
row=1,
col=1,
)
fig.add_trace(
go.Histogram(
x=group1,
name="group1",
histnorm="probability density",
opacity=0.6,
marker_color="#1f77b4",
showlegend=False,
),
row=1,
col=2,
)
fig.add_trace(
go.Histogram(
x=group2_outlier,
name="group2 (+ outlier)",
histnorm="probability density",
opacity=0.6,
marker_color="#d62728",
),
row=1,
col=2,
)
fig.update_layout(
title="Outliers can change means (and therefore the t-test)",
barmode="overlay",
)
fig.update_xaxes(title_text="Value", row=1, col=1)
fig.update_xaxes(title_text="Value", row=1, col=2)
fig.update_yaxes(title_text="Density", row=1, col=1)
fig.update_yaxes(title_text="Density", row=1, col=2)
fig.show()
Welch (baseline):
t = -3.2861 | df = 48.21 | p = 0.00189875 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.6753 (n=28)
diff = mean1 - mean2 = -0.7107, SE(diff) = 0.2163
95% CI for diff: [-1.1455, -0.2759]
Cohen's d (average SD): -0.9074
Welch (group2 with one added outlier):
t = -3.1784 | df = 49.48 | p = 0.00255332 (two-sided, equal_var=False)
mean1 = -0.0354 (n=25), mean2 = 0.8589 (n=29)
diff = mean1 - mean2 = -0.8943, SE(diff) = 0.2814
95% CI for diff: [-1.4596, -0.3290]
Cohen's d (average SD): -0.8555
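For comparison, a rank-based test such as Mann–Whitney U (mentioned in section 1) is far less sensitive to a single extreme value. A quick cross-check if SciPy is available, not a full robustness analysis:

if HAS_SCIPY:
    from scipy.stats import mannwhitneyu

    u_base = mannwhitneyu(group1, group2, alternative="two-sided")
    u_out = mannwhitneyu(group1, group2_outlier, alternative="two-sided")
    print("Mann-Whitney p (baseline):    ", float(u_base.pvalue))
    print("Mann-Whitney p (with outlier):", float(u_out.pvalue))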
9) Simulation intuition: Type I error and power#
A good mental model:
Type I error (false positive rate): if \(H_0\) is true, how often do we reject at level \(\alpha\)?
Power: if there is a real mean difference, how often do we reject?
Below we simulate many experiments and visualize:
the distribution of t-statistics under \(H_0\)
the distribution of p-values under \(H_0\) (should be roughly Uniform(0,1))
a simple power curve as the true mean difference grows
For speed we use Student’s t-test here (equal variances, fixed df).
def student_t_stats_matrix(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
'''Vectorized Student t-statistics for many simulated experiments.
X: shape (n_sims, n1)
Y: shape (n_sims, n2)
'''
n1 = X.shape[1]
n2 = Y.shape[1]
m1 = X.mean(axis=1)
m2 = Y.mean(axis=1)
v1 = X.var(axis=1, ddof=1)
v2 = Y.var(axis=1, ddof=1)
df = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
return (m1 - m2) / se
alpha = 0.05
n_sims = 8000
n1 = n2 = 20
# Under H0: same mean, same variance
X0 = rng.normal(0.0, 1.0, size=(n_sims, n1))
Y0 = rng.normal(0.0, 1.0, size=(n_sims, n2))
df0 = n1 + n2 - 2
lookup0 = make_t_lookup(df0, t_max=12.0, n=80_000)
t_stats0 = student_t_stats_matrix(X0, Y0)
cdf0 = t_cdf(t_stats0, lookup0)
p_vals0 = 2.0 * np.minimum(cdf0, 1.0 - cdf0)
print("Empirical Type I error at alpha=0.05:", float(np.mean(p_vals0 < alpha)))
# Plot p-values under H0 (should be ~Uniform(0,1))
fig = go.Figure()
fig.add_trace(go.Histogram(x=p_vals0, nbinsx=40, histnorm="probability density", name="p-values"))
fig.add_trace(go.Scatter(x=[0, 1], y=[1, 1], mode="lines", name="Uniform(0,1) density=1"))
fig.update_layout(
title="p-values under H0 (should be roughly uniform)",
xaxis_title="p-value",
yaxis_title="Density",
)
fig.show()
# Plot t-statistics under H0 with theoretical t pdf
x_line = np.linspace(-6, 6, 1400)
pdf_line = t_pdf(x_line, df0)
fig = go.Figure()
fig.add_trace(go.Histogram(x=t_stats0, nbinsx=60, histnorm="probability density", name="simulated t"))
fig.add_trace(go.Scatter(x=x_line, y=pdf_line, mode="lines", name=f"t pdf (df={df0})"))
fig.update_layout(
title="t-statistics under H0 (simulation vs theoretical t density)",
xaxis_title="t",
yaxis_title="Density",
)
fig.show()
Empirical Type I error at alpha=0.05: 0.04875
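Is 0.04875 close enough to 0.05? The Monte Carlo standard error of a binomial proportion tells us how much fluctuation to expect from the simulation itself:

p_hat = float(np.mean(p_vals0 < alpha))
mc_se = math.sqrt(alpha * (1.0 - alpha) / n_sims)  # SE of a proportion estimate
print(f"estimate = {p_hat:.5f}, target = {alpha}, MC standard error = {mc_se:.5f}")
# The estimate is well within two standard errors of 0.05, so the test looks
# correctly calibrated as far as this simulation can tell.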
# A simple power curve: how power increases as the true mean difference grows
deltas = np.linspace(0.0, 1.2, 9) # true |mu1 - mu2| when sigma=1
powers = []
for delta in deltas:
X = rng.normal(0.0, 1.0, size=(n_sims, n1))
Y = rng.normal(delta, 1.0, size=(n_sims, n2))
t_stats = student_t_stats_matrix(X, Y)
cdf = t_cdf(t_stats, lookup0)
p_vals = 2.0 * np.minimum(cdf, 1.0 - cdf)
powers.append(float(np.mean(p_vals < alpha)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=deltas, y=powers, mode="lines+markers"))
fig.update_layout(
title="Power vs true mean difference (n1=n2=20, sigma=1, alpha=0.05)",
xaxis_title="True |mu1 - mu2|",
yaxis_title="Power (P(reject H0))",
)
fig.show()
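As a rough cross-check on the simulated curve, a normal-approximation power formula (ignoring the small difference between the t and z critical values at these degrees of freedom) should track it closely. The helper approx_power is ours, for illustration:

def approx_power(delta: float, n1: int, n2: int, sigma: float = 1.0) -> float:
    '''Normal-approximation power for a two-sided two-sample test at alpha = 0.05.'''
    se = sigma * math.sqrt(1.0 / n1 + 1.0 / n2)
    z_crit = 1.959964  # z_{0.975}

    def phi(z: float) -> float:
        # standard normal CDF via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Under the alternative, the standardized statistic is roughly Normal(delta / se, 1)
    return (1.0 - phi(z_crit - delta / se)) + phi(-z_crit - delta / se)

for d in (0.0, 0.3, 0.6, 0.9, 1.2):
    print(f"delta = {d:.1f}: approx power = {approx_power(d, 20, 20):.3f}")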
10) Practical usage (SciPy cross-check)#
In real work, you typically use scipy.stats.ttest_ind:
equal_var=True → Student’s t-test
equal_var=False → Welch’s t-test
Below we compare SciPy’s result to our NumPy-only implementation.
if HAS_SCIPY:
r_student = ttest_ind(group1, group2, equal_var=True)
r_welch = ttest_ind(group1, group2, equal_var=False)
print("SciPy Student:")
print(" t=", float(r_student.statistic), "p=", float(r_student.pvalue))
print("Our Student:")
print(" t=", res_student.t, "p=", res_student.p_value)
print("\nSciPy Welch:")
print(" t=", float(r_welch.statistic), "p=", float(r_welch.pvalue))
print("Our Welch:")
print(" t=", res_welch.t, "p=", res_welch.p_value)
else:
print("SciPy not installed; skipping cross-check.")
SciPy Student:
t= -3.3100619557330977 p= 0.00171714967615853
Our Student:
t= -3.3100619557330977 p= 0.0017171497826875548
SciPy Welch:
t= -3.286069304555434 p= 0.0018987486385477889
Our Welch:
t= -3.2860693045554346 p= 0.001898748756941293
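The agreement can be turned into an automated check. The tolerances below are a judgment call reflecting the resolution of our lookup table:

if HAS_SCIPY:
    np.testing.assert_allclose(res_student.t, r_student.statistic, rtol=1e-12)
    np.testing.assert_allclose(res_student.p_value, r_student.pvalue, rtol=1e-6)
    np.testing.assert_allclose(res_welch.t, r_welch.statistic, rtol=1e-12)
    np.testing.assert_allclose(res_welch.p_value, r_welch.pvalue, rtol=1e-6)
    print("NumPy-only implementation matches SciPy within tolerance.")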
Exercises#
Change the sample sizes (n1, n2) and see how the standard error and power change.
Try a clearly non-normal distribution (e.g. exponential) and compare the t-test to a permutation test.
Simulate unequal variances and compare Student vs Welch rejection rates.
Compute and visualize a bootstrap CI for \((\mu_1-\mu_2)\) and compare it to the t-based CI.
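For the last exercise, a starter sketch of a percentile bootstrap CI (one of several bootstrap variants; the helper name bootstrap_ci_diff is ours):

def bootstrap_ci_diff(x, y, n_boot: int = 10_000, alpha: float = 0.05, seed: int = 0):
    '''Percentile bootstrap CI for mean(x) - mean(y) (illustrative sketch).'''
    rng_local = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Resample each group independently, with replacement
    bx = rng_local.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    by = rng_local.choice(y, size=(n_boot, y.size), replace=True).mean(axis=1)
    diffs = bx - by
    lo, hi = np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lo), float(hi)

print("bootstrap 95% CI:", bootstrap_ci_diff(group1, group2))
print("t-based   95% CI:", res_welch.ci)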
References#
Student (1908), The probable error of a mean
Welch (1947), The generalization of “Student’s” problem when several different population variances are involved
SciPy docs:
scipy.stats.ttest_ind